In this paper we study the problem of recovering sparse or compressible signals from uniformly
quantized measurements. We present a new class of convex optimization programs, or decoders, coined
Basis Pursuit DeQuantizer of moment $p$ (BPDQp), that model the quantization distortion more faithfully
than the commonly used Basis Pursuit DeNoise (BPDN) program. Our decoders proceed by minimizing
the sparsity of the signal to be reconstructed subject to a data-fidelity constraint expressed in the
$\ell_p$-norm of the residual error for $2 \le p \le \infty$.

We show theoretically that (i) the reconstruction error of these new decoders is bounded if the
sensing matrix satisfies an extended Restricted Isometry Property involving the $\ell_p$ norm, and (ii) for
Gaussian random matrices and uniformly quantized measurements, BPDQp performance exceeds that
of BPDN by dividing the reconstruction error due to quantization by $\sqrt{p+1}$. This last effect happens
with high probability when the number of measurements exceeds a value growing with $p$, i.e., in an
oversampled situation compared to what is commonly required by BPDN = BPDQ2. To demonstrate
the theoretical power of BPDQp, we report numerical simulations on signal and image reconstruction
problems.

I. Introduction

The theory of Compressed Sensing (CS) [1], [2] aims at reconstructing sparse or compressible signals
from a small number of linear measurements compared to the dimensionality of the signal space. In 
short, the signal reconstruction is possible if the underlying sensing matrix is well behaved, i.e., if it 
respects a Restricted Isometry Property (RIP) saying roughly that any small subset of its columns is
"close" to an orthogonal basis. The signal recovery is then obtained using non-linear techniques based on
convex optimization promoting signal sparsity, such as the Basis Pursuit program [3]. What makes CS
more than merely an interesting theoretical concept is that some classes of randomly generated matrices
(e.g., Gaussian, Bernoulli, partial Fourier ensemble, etc.) satisfy the RIP with overwhelming probability.
This happens as soon as their number of rows, i.e., the number of CS measurements, is higher than a 
few multiples of the assumed signal sparsity. 

In a realistic acquisition system, quantization of these measurements is a natural process that
Compressed Sensing theory has to handle conveniently. One commonly used technique is to simply treat the
quantization distortion as Gaussian noise, which leads to reconstruction based on solving the Basis Pursuit
DeNoising (BPDN) program (either in its constrained or augmented Lagrangian forms) [4]. While this
approach can give acceptable results, it is theoretically unsatisfactory as the measurement error created
by quantization is highly non-Gaussian, being essentially uniform and bounded by the quantization bin
width.

An appealing requirement for the design of better reconstruction methods is the Quantization
Consistency (QC) constraint, i.e., that the requantized measurements of the reconstructed signal equal the
original quantized measurements. This idea, in some form, has appeared previously in the literature.
Near the beginning of the development of CS theory, Candès et al. mentioned that the $\ell_2$-norm of
BPDN should be replaced by the $\ell_\infty$-norm to handle more naturally the quantization distortion of the
measurements [4]. More recently, in [5], the extreme case of 1-bit CS is studied, i.e., when only the signs
of the measurements are sent to the decoder. The authors tackle the reconstruction problem by adding a sign
consistency constraint in a modified BPDN program working on the sphere of unit-norm signals. In [6], an
adaptation of both BPDN and the Subspace Pursuit integrates an explicit QC constraint. In [7], a model
integrating additional Gaussian noise on the measurements before their quantization is analyzed and
solved with an $\ell_1$-regularized maximum likelihood program. However, in spite of interesting experimental
results, no theoretical guarantees are given about the approximation error reached by these solutions.
The QC constraint has also been used previously for image and signal processing outside of the CS
field. Examples include oversampled Analog to Digital Converters (ADC) [8], and image restoration
problems [9], [10].

In this paper, we propose a new class of convex optimization programs, or decoders, coined the Basis
Pursuit DeQuantizer of moment $p$ (BPDQp), that model the quantization distortion more faithfully. These
proceed by minimizing the sparsity of the reconstructed signal (expressed in the $\ell_1$-norm) subject to a
particular data-fidelity constraint. This constraint imposes that the difference between the original and
the reproduced measurements have bounded $\ell_p$-norm, for $2 \le p \le \infty$. As $p$ approaches infinity, this
fidelity term reproduces the QC constraint as promoted initially in [4]. However, our idea is to study,
given a certain sparsity level and as a function of the number of measurements available, which moment
$2 \le p \le \infty$ provides the best reconstruction result.

Our overall result, which surprisingly does not favor $p = \infty$, may be expressed by the principle: Given
a certain sparsity level, if the number of measurements is higher than a minimal value growing with $p$,
i.e., in oversampled situations, by using BPDQp instead of BPDN = BPDQ2 the reconstruction error due
to quantization can be reduced by a factor of $\sqrt{p+1}$.

At first glance, it could seem counterintuitive to oversample the "compressive sensing" of a signal.
After all, many results in Compressed Sensing seek to limit the number of measurements required to
encode a signal, while guaranteeing exact reconstruction with high probability. However, as analyzed for
instance in [11], this way of thinking avoids considering the actual amount of information needed to
describe the measurement vector. In the case of noiseless observations of a sparse signal, Compressed
Sensing guarantees perfect reconstruction only for real-valued measurements, i.e., for an infinite number
of bits per measurement.

From a rate-distortion perspective, the analysis shown in [11], [13] also demonstrates that CS is
suboptimal compared to transform coding. From that point of view, the best CS encoding strategy is to
use all the available bit-rate to obtain as few CS measurements as possible and quantize them as finely
as possible.

However, in many practical situations the quantization bit-depth per measurement is pre-determined 
by the hardware, e.g., for real sensors embedding CS and a fixed A/D conversion of the measurements. 
In that case, the only way to improve the reconstruction quality is to gather more measurements,
i.e., to oversample the signal.^1 This does not degrade one of the main interests of Compressed Sensing,
i.e., providing highly informative linear signal measurements at a very low computation cost.

The paper is structured as follows. In Section II, we review the principles of Compressed Sensing and
previous approaches for accommodating the problem of measurement quantization. Section III introduces
the BPDQp decoders. Their stability, i.e., the $\ell_2 - \ell_1$ instance optimality, is deduced using an extended
version of the Restricted Isometry Property involving the $\ell_p$-norm. In Section IV, Standard Gaussian
Random matrices, i.e., whose entries are independent and identically distributed (iid) standard Gaussian,
are shown to satisfy this property with high probability for a sufficiently large number of measurements.
Section V explains the key result of this paper: that the approximation error of BPDQp scales inversely
with $\sqrt{p+1}$. Section VI describes the convex optimization framework adopted to solve the BPDQp
programs. Finally, Section VII provides experimental validation of the theoretical power of BPDQp on
1-D signals and on an image reconstruction example.



II. Compressed Sensing and Quantization of Measurements 

In Compressed Sensing (CS) theory [1], [2], the signal $x \in \mathbb{R}^N$ to be acquired and subsequently
reconstructed is typically assumed to be sparse or compressible in an orthogonal^2 basis $\Psi \in \mathbb{R}^{N \times N}$
(e.g., wavelet basis, Fourier, etc.). In other words, the best $K$-term approximation $x_K$ of $x$ in $\Psi$ gives
an exact (for the sparse case) or accurate (for the compressible case) representation of $x$ even for small
$K < N$. For simplicity, only the canonical basis $\Psi = {\rm Id}$ will be considered here.

At the acquisition stage, $x$ is encoded by $m$ linear measurements (with $K \le m \le N$) provided by
a sensing matrix $\Phi \in \mathbb{R}^{m \times N}$, i.e., all known information about $x$ is contained in the $m$ measurements
$\langle \varphi_i, x \rangle = \sum_k \Phi_{ik} x_k$, where $\{\varphi_i\}_{i=1}^{m}$ are the rows of $\Phi$.

In this paper, we are interested in a particular non-ideal sensing model. Indeed, as measurement of
continuous signals by digital devices always involves some form of quantization, in practice devices
based on CS encoding must be able to accommodate the distortions in the linear measurements created
by quantization. Therefore, we adopt the noiseless and uniformly quantized sensing (or coding) model:

$$y_q = Q_\alpha[\Phi x] = \Phi x + n, \tag{1}$$

where $y_q \in (\alpha\mathbb{Z} + \frac{\alpha}{2})^m$ is the quantized measurement vector, $(Q_\alpha[\cdot])_i = \alpha \lfloor (\cdot)_i / \alpha \rfloor + \frac{\alpha}{2}$ is the uniform
quantization operator in $\mathbb{R}^m$ of bin width $\alpha$, and $n = Q_\alpha[\Phi x] - \Phi x$ is the quantization distortion.

^1 Generally, it is also less expensive in hardware to oversample a signal than to quantize measurements more finely.
^2 A generalization for redundant basis, or dictionary, exists [14], [15].

The model (1) is a realistic description of systems where the quantization distortion dominates other
secondary noise sources (e.g., thermal noise), an assumption valid for many electronic measurement
devices including ADC. In this paper we restrict our study to this extremely simple uniform
quantization model, in order to concentrate on the interaction with the CS theory. For instance, this
quantization scenario does not take into account the possible saturation of the quantizer happening when
the value to be digitized is outside the operating range of the quantizer, this range being determined
by the number of bits available. For Compressed Sensing, this effect has been studied recently in [16].
The authors obtained better reconstruction methods by either imposing the reproduction of saturated
measurements (Saturation Consistency) or by discarding them thanks to the "democratic" property of most
random sensing matrices. Their work, however, does not integrate the Quantization Consistency for all
the unsaturated measurements. The study of more realistic non-uniform quantization is also deferred as
a question for future research.
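
As a concrete illustration of the sensing model (1), the following Python sketch implements the uniform quantizer $Q_\alpha[t] = \alpha\lfloor t/\alpha \rfloor + \alpha/2$ and checks that the distortion never exceeds half a bin width. The function names, the bin width and the Gaussian stand-in for $\Phi x$ are illustrative assumptions and are not taken from the experiments of this paper.

```python
import math
import random

def quantize(t, alpha):
    """Uniform quantizer of bin width alpha: alpha * floor(t / alpha) + alpha / 2."""
    return alpha * math.floor(t / alpha) + alpha / 2.0

def quantize_vector(z, alpha):
    """Apply the scalar quantizer to every entry of z."""
    return [quantize(t, alpha) for t in z]

random.seed(0)
alpha = 0.5                                          # assumed bin width
z = [random.gauss(0.0, 1.0) for _ in range(1000)]    # stands in for Phi x
yq = quantize_vector(z, alpha)                       # quantized measurements
n = [q - t for q, t in zip(yq, z)]                   # quantization distortion
max_distortion = max(abs(v) for v in n)
```

Each quantized value sits at the centre of its bin, which is why the distortion is bounded by $\alpha/2$ in absolute value.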

In much previous work in CS, the reconstruction of $x$ from $y_q$ is obtained by treating the quantization
distortion $n$ as a noise of bounded power (i.e., $\ell_2$-norm) $\|n\|_2^2 = \sum_k |n_k|^2$. In this case, a robust
reconstruction of the signal $x$ from corrupted measurements $y = \Phi x + n$ is provided by the Basis
Pursuit DeNoise (BPDN) program (or decoder) [17]:

$$\Delta(y, \epsilon) = \operatorname*{argmin}_{u \in \mathbb{R}^N} \|u\|_1 \ \text{ s.t. } \ \|y - \Phi u\|_2 \le \epsilon. \tag{BPDN}$$

This convex optimization program can be solved numerically by methods like Second Order Cone
Programming or by monotone operator splitting methods [18], [19] described in Section VI. Notice
that the noiseless situation $\epsilon = 0$ leads to the Basis Pursuit (BP) program, which may also be solved by
Linear Programming [20].

An important condition for BPDN to provide a good reconstruction is the feasibility of the initial signal
$x$, i.e., we must choose $\epsilon$ in the (fidelity) constraint of BPDN such that $\|n\|_2 = \|y - \Phi x\|_2 \le \epsilon$. In [17], an
estimator of $\epsilon$ for $y = y_q$ is obtained by considering $n$ as a random vector $\xi \in \mathbb{R}^m$ distributed uniformly
over the quantization bins, i.e., $\xi_i \sim_{\rm iid} U([-\frac{\alpha}{2}, \frac{\alpha}{2}])$.

An easy computation shows then that $\|\xi\|_2 \le \epsilon_2(\alpha)$ with probability higher than $1 - e^{-c_0 \kappa^2}$ for a
certain constant $c_0 > 0$ (by the Chernoff-Hoeffding bound [21]), where

$$\epsilon_2^2(\alpha) := \mathbb{E}\|\xi\|_2^2 + \kappa \sqrt{{\rm Var}\,\|\xi\|_2^2} = \frac{\alpha^2}{12}\, m + \kappa\, \frac{\alpha^2}{6\sqrt{5}}\, m^{1/2}.$$

Therefore, CS usually handles quantization distortion by setting $\epsilon = \epsilon_2(\alpha)$, typically for $\kappa = 2$.
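
The estimator above can be checked numerically. The sketch below assumes that it reads $\epsilon_2^2(\alpha) = \frac{\alpha^2}{12} m + \kappa \frac{\alpha^2}{6\sqrt{5}} \sqrt{m}$, which follows from $\mathbb{E}\,\xi_i^2 = \alpha^2/12$ and ${\rm Var}\,\xi_i^2 = \alpha^4/180$ for a uniform distortion. The helper name `epsilon2` and the simulation sizes are hypothetical choices made only for illustration.

```python
import math
import random

def epsilon2(alpha, m, kappa=2.0):
    """l2 bound: sqrt(alpha^2/12 * m + kappa * alpha^2/(6 sqrt 5) * sqrt(m))."""
    mean = alpha ** 2 / 12.0 * m
    spread = alpha ** 2 / (6.0 * math.sqrt(5.0)) * math.sqrt(m)
    return math.sqrt(mean + kappa * spread)

random.seed(1)
alpha, m, trials = 1.0, 200, 2000
eps = epsilon2(alpha, m)

# Empirical frequency with which a uniform distortion satisfies ||xi||_2 <= eps.
hits = 0
for _ in range(trials):
    xi = [random.uniform(-alpha / 2.0, alpha / 2.0) for _ in range(m)]
    if math.sqrt(sum(v * v for v in xi)) <= eps:
        hits += 1
feasible_fraction = hits / trials
```

With $\kappa = 2$ the bound sits two standard deviations above the mean of $\|\xi\|_2^2$, so most draws are expected to be feasible.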

When the feasibility is satisfied, the stability of BPDN is guaranteed if the sensing matrix $\Phi \in \mathbb{R}^{m \times N}$
satisfies one instance of the following property:

Definition 1. A matrix $\Phi \in \mathbb{R}^{m \times N}$ satisfies the (extended) Restricted Isometry Property (RIP$_{p,q}$) (with
$p, q > 0$) of order $K$ and radius $\delta_K \in (0, 1)$, if there exists a constant $\mu_{p,q} > 0$ such that

$$\mu_{p,q}\, (1 - \delta_K)^{1/q}\, \|u\|_q \le \|\Phi u\|_p \le \mu_{p,q}\, (1 + \delta_K)^{1/q}\, \|u\|_q, \tag{2}$$

for all $K$-sparse signals $u \in \mathbb{R}^N$.

In other words, $\Phi$, as a mapping from $\ell_q^N = (\mathbb{R}^N, \|\cdot\|_q)$ to $\ell_p^m = (\mathbb{R}^m, \|\cdot\|_p)$, acts as a (scaled) isometry
on $K$-sparse signals of $\mathbb{R}^N$. This definition is more general than the common RIP [22]. The latter, which
ensures the stability of BPDN (see Theorem 1 below), corresponds to $p = q = 2$ in (2). The original
definition also considers normalized matrices $\bar\Phi = \Phi/\mu_{2,2}$ having unit-norm columns (in expectation) so
that $\mu_{2,2}$ is absorbed in the normalizing constant.

We prefer to use this extended RIP$_{p,q}$ since, as will become clear in Section V, the case $p \ge 2$ and
$q = 2$ provides us with the interesting embedding (2) for measurement vectors corrupted by generalized
Gaussian and uniform noises. As explained below, this definition also includes other RIP generalizations [26].

We note that there are several examples already described in the literature of classes of matrices which
satisfy the RIP$_{p,q}$ for specific values of $p$ and $q$. For instance, for $p = q = 2$, a matrix $\Phi \in \mathbb{R}^{m \times N}$ with
each of its entries drawn independently from a (sub) Gaussian random variable satisfies this property
with an overwhelming probability if $m \ge c K \log N/K$ for some value $c > 0$ independent of the involved
dimensions [23], [24], [25]. This is the case of Standard Gaussian Random (SGR) matrices whose entries
are iid $\Phi_{ij} \sim \mathcal{N}(0, 1)$, and of the Bernoulli matrices with $\Phi_{ij} = \pm 1$ with equal probability, both
cases having $\mu_{2,2} = \sqrt{m}$ [23]. Other random constructions satisfying the RIP$_{2,2}$ are known (e.g., partial
Fourier ensemble) [2], [17]. For the case $p = q = 1 + O(1)/\log N$, it is proved in [26], [27] that sparse
matrices obtained from an adjacency matrix of a high-quality unbalanced expander graph are RIP$_{p,p}$ (with
$\mu_{p,p} = 1/(1 - \delta_K)$). In the context of non-convex signal reconstruction, the authors in [28] show also that
Gaussian random matrices satisfy the Restricted $p$-Isometry, i.e., RIP$_{p,q}$ for $q = 2$, $0 < p < 1$, $\mu_{p,2} = 1$
and appropriate redefinition of $\delta_K$.

The following theorem expresses the announced stability result, i.e., the $\ell_2 - \ell_1$ instance optimality^3
of BPDN, as a consequence of the RIP$_{2,2}$.

Theorem 1 ([22]). Let $x \in \mathbb{R}^N$ be a signal whose compressibility is measured by the decreasing of the
$K$-term $\ell_1$-approximation error $e_0(K) = K^{-1/2}\, \|x - x_K\|_1$, for $0 \le K \le N$, and $x_K$ the best $K$-term
$\ell_2$-approximation of $x$. Let $\Phi$ be a RIP$_{2,2}$ matrix of order $2K$ and radius $0 < \delta_{2K} < \sqrt{2} - 1$. Given a
measurement vector $y = \Phi x + n$ corrupted by a noise $n$ with power $\|n\|_2 \le \epsilon$, the solution $x^* = \Delta(y, \epsilon)$
obeys

$$\|x^* - x\|_2 \le A\, e_0(K) + B\, \frac{\epsilon}{\mu_{2,2}}, \tag{3}$$



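for $A(\Phi, K) = 2\, \frac{1 + (\sqrt{2} - 1)\delta_{2K}}{1 - (\sqrt{2} + 1)\delta_{2K}}$ and $B(\Phi, K) = \frac{4\sqrt{1 + \delta_{2K}}}{1 - (\sqrt{2} + 1)\delta_{2K}}$. For instance, for $\delta_{2K} = 0.2$, $A < 4.2$ and
$B < 8.5$.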

Let us note that the theorem condition $\delta_{2K} < \sqrt{2} - 1$ on the RIP radius can be refined (like in
[31]). We know nevertheless from Davies and Gribonval [32] that $\ell_1$-minimization will fail for at least
one vector for $\delta_{2K} > 1/\sqrt{2}$. The room for improvement is then very small.



Using the BPDN decoder to account for quantization distortion is theoretically unsatisfying for several
reasons. First, there is no guarantee that the BPDN solution $x^*$ respects the Quantization Consistency,
i.e.,

$$Q_\alpha[\Phi x^*] = y_q \quad \Leftrightarrow \quad \|y_q - \Phi x^*\|_\infty \le \tfrac{\alpha}{2}, \tag{QC}$$

which is not necessarily implied by the BPDN $\ell_2$ fidelity constraint. The failure of BPDN to respect QC
suggests that it may not be taking advantage of all of the available information about the noise structure
in the measurements.

Second, from a Bayesian Maximum a Posteriori (MAP) standpoint, BPDN can be viewed as solving
an ill-posed inverse problem where the $\ell_2$-norm used in the fidelity term corresponds to the conditional
log-likelihood associated to an additive white Gaussian noise. However, the quantization distortion is not
Gaussian, but rather uniformly distributed. This motivates the need for a new kind of CS decoder that
more faithfully models the quantization distortion.
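
To make the first point concrete, the following Python sketch checks the QC condition for two hypothetical candidate values of $\Phi x^*$. The quantizer is the uniform one of model (1); the numerical values and the helper names are invented for illustration only. The second candidate has the smaller $\ell_2$ residual, yet it violates QC because one entry leaves its quantization bin.

```python
import math

def quantize(t, alpha):
    """Uniform quantizer of bin width alpha (bin centres at alpha * k + alpha / 2)."""
    return alpha * math.floor(t / alpha) + alpha / 2.0

def is_quantization_consistent(yq, z, alpha):
    """True when requantizing z (a candidate Phi x*) reproduces yq."""
    return all(quantize(t, alpha) == q for q, t in zip(yq, z))

def linf_residual(yq, z):
    return max(abs(q - t) for q, t in zip(yq, z))

def l2_residual(yq, z):
    return math.sqrt(sum((q - t) ** 2 for q, t in zip(yq, z)))

alpha = 1.0
yq = [0.5, 1.5, -0.5]          # quantized measurements
z_a = [0.9, 1.1, -0.9]         # candidate A: every entry stays in its bin
z_b = [1.15, 1.5, -0.5]        # candidate B: first entry jumps to the next bin

consistent_a = is_quantization_consistent(yq, z_a, alpha)
consistent_b = is_quantization_consistent(yq, z_b, alpha)
```

Away from bin boundaries, consistency coincides with the bound $\|y_q - \Phi x^*\|_\infty \le \alpha/2$, which is the form used in (QC).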

^3 Adopting the definition of mixed-norm instance optimality [29].

III. Basis Pursuit DeQuantizer (BPDQp)

The considerations of the previous section encourage the definition of a new class of optimization
programs (or decoders) generalizing the fidelity term of the BPDN program.

Our approach is based on reconstructing a sparse approximation of $x$ from its measurements $y = \Phi x + n$
under the assumption that the $\ell_p$-norm ($p \ge 1$) of the noise $n$ is bounded, i.e., $\|n\|_p^p = \sum_k |n_k|^p \le \epsilon^p$ for
some $\epsilon > 0$. We introduce the novel programs

$$\Delta_p(y, \epsilon) = \operatorname*{argmin}_{u \in \mathbb{R}^N} \|u\|_1 \ \text{ s.t. } \ \|y - \Phi u\|_p \le \epsilon. \tag{BPDQp}$$

The fidelity constraint expressed in the $\ell_p$-norm is now tuned to noises that follow a zero-mean Generalized
Gaussian Distribution^4 (GGD) of shape parameter $p$ [30], with the uniform noise case corresponding to
$p \to \infty$.

We dub this class of decoders Basis Pursuit DeQuantizer of moment $p$ (or BPDQp) since, for reasons
that will become clear in Section V, their approximation error when $\Phi x$ is uniformly quantized has an
interesting decreasing behavior when both the moment $p$ and the oversampling factor $m/K$ increase.
Notice that the decoder corresponding to $p = 1$ has been previously analyzed in [33] for Laplacian noise.

One of the main results of this paper concerns the $\ell_2 - \ell_1$ instance optimality of the BPDQp decoders,
i.e., their stability when the signal to be recovered is compressible, and when the measurements are
contaminated by noise of bounded $\ell_p$-norm. In the following theorem, we show that such an optimality
happens when the sensing matrix respects the (extended) Restricted Isometry Property RIP$_{p,2}$ for
$2 \le p < \infty$.

Theorem 2. Let $x \in \mathbb{R}^N$ be a signal with a $K$-term $\ell_1$-approximation error $e_0(K) = K^{-1/2}\, \|x - x_K\|_1$,
for $0 \le K \le N$ and $x_K$ the best $K$-term $\ell_2$-approximation of $x$. Let $\Phi$ be a RIP$_{p,2}$ matrix on $s$ sparse
signals with constants $\delta_s$, for $s \in \{K, 2K, 3K\}$ and $2 \le p < \infty$. Given a measurement vector $y = \Phi x + n$
corrupted by a noise $n$ with bounded $\ell_p$-norm, i.e., $\|n\|_p \le \epsilon$, the solution $x_p^* = \Delta_p(y, \epsilon)$ of BPDQp
obeys

$$\|x_p^* - x\|_2 \le A_p\, e_0(K) + B_p\, \epsilon/\mu_{p,2},$$

for values $A_p(\Phi, K) = \frac{2(1 + C_p - \delta_{2K})}{1 - \delta_{2K} - C_p}$, $B_p(\Phi, K) = \frac{4\sqrt{1 + \delta_{2K}}}{1 - \delta_{2K} - C_p}$, and $C_p = C_p(\Phi, 2K, K)$ given in the proof
of Lemma 2 (Appendix D).

^4 The probability density function $f$ of such a distribution is $f(x) \propto \exp(-|x/b|^p)$ for a standard deviation $\sigma \propto b$.

As shown in Appendix E, this theorem follows from a generalization of the fundamental result proved
by Candès [22] to the particular geometry of Banach spaces $\ell_p$.

IV. Example of RIP$_{p,2}$ Matrices

Interestingly, it turns out that SGR matrices $\Phi \in \mathbb{R}^{m \times N}$ also satisfy the RIP$_{p,2}$ with high probability
provided that $m$ is sufficiently large compared to the sparsity $K$ of the signals to measure. This is made
formal in the following Proposition, for which the proof^5 is given in Appendix A.

Proposition 1. Let $\Phi \in \mathbb{R}^{m \times N}$ be a Standard Gaussian Random (SGR) matrix, i.e., its entries are iid
$\mathcal{N}(0, 1)$. Then, if $m \ge (p - 1)\, 2^{p+1}$ for $2 \le p < \infty$ and $m \ge 0$ for $p = \infty$, there exists a constant $c > 0$
such that, for

$$\Theta_p(m) \ge c\, \delta^{-2} \Big( K \log\big[e \tfrac{N}{K} (1 + 12\, \delta^{-1})\big] + \log \tfrac{2}{\eta} \Big), \tag{4}$$

with $\Theta_p(m) = m^{2/p}$ for $1 \le p < \infty$ and $\Theta_p(m) = \log m$ for $p = \infty$, $\Phi$ is RIP$_{p,2}$ of order $K$ and radius
$\delta$ with probability higher than $1 - \eta$. Moreover, the value $\mu_{p,2} = \mathbb{E}\|\xi\|_p$ is the expectation value of the
$\ell_p$-norm of a SGR vector $\xi \in \mathbb{R}^m$.

Roughly speaking, this proposition tells us that to generate a matrix that is RIP$_{p,2}$ with high probability,
we need a number of measurements $m$ that grows polynomially in $K \log N/K$ with an "order" $p/2$ for
$2 \le p < \infty$, while the limit case $p = \infty$ grows exponentially in $K \log N/K$.

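Notice that an asymptotic estimation of $\mu_{p,2}$, i.e., for $m \to \infty$, can be found in [34] for $1 \le p < \infty$.
However, as presented in the following Lemma (proved in Appendix C), non-asymptotic bounds for
$\mu_{p,2} = \mathbb{E}\|\xi\|_p$ can be expressed in terms of

$$(\mathbb{E}\|\xi\|_p^p)^{1/p} = \nu_p\, m^{1/p},$$

with $g \sim \mathcal{N}(0, 1)$ and $\nu_p^p = \mathbb{E}|g|^p = 2^{p/2}\, \pi^{-1/2}\, \Gamma(\tfrac{p+1}{2})$.

Lemma 1. If $\xi \in \mathbb{R}^m$ is a SGR vector, then, for $1 \le p < \infty$,

$$\big(1 + \tfrac{2^{p+1}}{m}\big)^{\frac{1}{p} - 1}\, (\mathbb{E}\|\xi\|_p^p)^{1/p} \le \mathbb{E}\|\xi\|_p \le (\mathbb{E}\|\xi\|_p^p)^{1/p}.$$

In particular, as soon as $m \ge \beta^{-1}\, 2^{p+1}$ for $\beta \ge 0$, $\mathbb{E}\|\xi\|_p \ge (\mathbb{E}\|\xi\|_p^p)^{1/p}\, (1 + \beta)^{\frac{1}{p} - 1} \ge (\mathbb{E}\|\xi\|_p^p)^{1/p}\, (1 - \tfrac{p-1}{p}\beta)$.

For $p = \infty$, there exists a $\rho > 0$ such that $\rho^{-1} \sqrt{\log m} \le \mathbb{E}\|\xi\|_\infty \le \rho \sqrt{\log m}$.

^5 Interestingly, this proof shows also that SGR matrices are RIP$_{p,2}$ with high probability for $1 < p < 2$ when $m$ exceeds a
similar bound to (4).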
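
The bounds of Lemma 1 can be probed with a small Monte Carlo experiment. The Python sketch below assumes that the Lemma reads $(1 + 2^{p+1}/m)^{1/p - 1}\, \nu_p m^{1/p} \le \mathbb{E}\|\xi\|_p \le \nu_p m^{1/p}$ with $\nu_p^p = 2^{p/2} \pi^{-1/2} \Gamma(\frac{p+1}{2})$; the helper names, the values of $p$ and $m$ and the number of trials are arbitrary illustrative choices.

```python
import math
import random

def nu_p(p):
    """nu_p = (E|g|^p)^(1/p) for a standard Gaussian g."""
    moment = 2.0 ** (p / 2.0) * math.gamma((p + 1) / 2.0) / math.sqrt(math.pi)
    return moment ** (1.0 / p)

def mean_lp_norm(p, m, trials, rng):
    """Monte Carlo estimate of E||xi||_p for an m-dimensional SGR vector xi."""
    total = 0.0
    for _ in range(trials):
        total += sum(abs(rng.gauss(0.0, 1.0)) ** p for _ in range(m)) ** (1.0 / p)
    return total / trials

rng = random.Random(0)
p, m = 3, 20
upper = nu_p(p) * m ** (1.0 / p)                                   # (E||xi||_p^p)^(1/p)
lower = (1.0 + 2.0 ** (p + 1) / m) ** (1.0 / p - 1.0) * upper      # assumed lower bound
estimate = mean_lp_norm(p, m, 4000, rng)                           # approximates mu_{p,2}
```

The upper bound is Jensen's inequality, so the estimate is expected to fall slightly below `upper` and well above `lower` for these sizes.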

An interesting aspect of matrices respecting the RIP$_{p,2}$ is that they approximately preserve the
decorrelation of sparse vectors of disjoint supports.

Lemma 2. Let $u, v \in \mathbb{R}^N$ with $\|u\|_0 = s$ and $\|v\|_0 = s'$ and ${\rm supp}(u) \cap {\rm supp}(v) = \emptyset$, and $2 \le p < \infty$.
If $\Phi$ is RIP$_{p,2}$ of order $s + s'$ with constant $\delta_{s+s'}$, and of orders $s$ and $s'$ with constants $\delta_s$ and $\delta_{s'}$, then

$$|\langle J(\Phi u), \Phi v \rangle| \le \mu_{p,2}^2\, C_p\, \|u\|_2\, \|v\|_2, \tag{5}$$

with $(J(u))_i = \|u\|_p^{2-p}\, |u_i|^{p-1}\, {\rm sign}\, u_i$ and $C_p = C_p(\Phi, s, s')$ is given explicitly in Appendix D.

It is worth mentioning that the value $C_p$ behaves as $\sqrt{(\delta_s + \delta_{s+s'})\, (1 + \delta_{s'})\, (p - 2)}$ for large $p$, and
as $\delta_{s+s'} + \tfrac{3}{4}(1 + \delta_{s+s'})(p - 2)$ for $p \simeq 2$. Therefore, this result may be seen as a generalization of the
one proved in [22] (see Lemma 2.1) for $p = 2$ with $C_2 = \delta_{s+s'}$. As shown in Appendix D, this Lemma
uses explicitly the 2-smoothness of the Banach spaces $\ell_p$ when $p \ge 2$ [35], [36], in connection with the
normalized duality mapping $J$ that plays a central role in the geometrical description of $\ell_p$.

Lemma 2 is at the heart of the proof of Theorem 2, which prevents the latter from being valid for
$p = \infty$. This is related to the fact that the $\ell_\infty$ Banach space is not 2-smooth and no duality mapping
exists. Therefore, any result for $p = \infty$ would require different tools than those developed here.

V. BPDQp and Quantization Error Reduction

Let us now observe the particular behavior of the BPDQp decoders on quantized measurements of a
sparse or compressible signal assuming that $\alpha$ is known at the decoding step. In this Section, we consider
that $p \ge 2$ everywhere.

First, if we assume in the model (1) that the quantization distortion $n = Q_\alpha[\Phi x] - \Phi x$ is uniformly
distributed in each quantization bin, the simple Lemma below provides a precise estimator of $\epsilon$ for any
$\ell_p$-norm of $n$.

Lemma 3. If $\xi \in \mathbb{R}^m$ is a uniform random vector with $\xi_i \sim_{\rm iid} U([-\frac{\alpha}{2}, \frac{\alpha}{2}])$, then, for $1 \le p < \infty$,

$$\zeta_p := \mathbb{E}\|\xi\|_p^p = \frac{\alpha^p}{2^p (p+1)}\, m. \tag{6}$$

In addition, for any $\kappa > 0$, $\mathbb{P}\big[\|\xi\|_p^p \ge \zeta_p + \kappa \frac{\alpha^p}{2^p} \sqrt{m}\big] \le e^{-2\kappa^2}$, while $\lim_{p \to \infty} \big(\zeta_p + \kappa \frac{\alpha^p}{2^p} \sqrt{m}\big)^{1/p} = \frac{\alpha}{2}$.

The proof is given in Appendix F.

According to this result, we may set the $\ell_p$-norm bound $\epsilon$ of the program BPDQp to

$$\epsilon = \epsilon_p(\alpha) := \frac{\alpha}{2\, (p+1)^{1/p}}\, \big(m + \kappa\, (p+1) \sqrt{m}\big)^{1/p}, \tag{7}$$

so that, for $\kappa = 2$, we know that $x$ is a feasible solution of the BPDQp fidelity constraint with a probability
exceeding $1 - e^{-8} > 1 - 3.4 \times 10^{-4}$.
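
The bound (7) is easy to evaluate. The Python sketch below assumes that it reads $\epsilon_p(\alpha) = \frac{\alpha}{2 (p+1)^{1/p}} (m + \kappa (p+1) \sqrt{m})^{1/p}$, which is what Lemma 3 gives when the $p$-th root of $\zeta_p + \kappa \frac{\alpha^p}{2^p} \sqrt{m}$ is taken. The helper names and the simulation sizes are hypothetical.

```python
import math
import random

def epsilon_p(alpha, m, p, kappa=2.0):
    """l_p fidelity bound for a uniform distortion of bin width alpha."""
    scale = alpha / (2.0 * (p + 1) ** (1.0 / p))
    return scale * (m + kappa * (p + 1) * math.sqrt(m)) ** (1.0 / p)

def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

random.seed(2)
alpha, m, p, trials = 1.0, 200, 4, 2000
eps = epsilon_p(alpha, m, p)

# Empirical frequency with which a uniform distortion is feasible for this bound.
hits = 0
for _ in range(trials):
    xi = [random.uniform(-alpha / 2.0, alpha / 2.0) for _ in range(m)]
    if lp_norm(xi, p) <= eps:
        hits += 1
feasible_fraction = hits / trials

# For very large p the bound approaches half the bin width, i.e., the QC radius.
eps_large_p = epsilon_p(alpha, m, 2000)
```

The large-$p$ value illustrates why the $\ell_p$ fidelity constraint approaches the QC constraint as $p$ grows.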

Second, Theorem 2 points out that, when $\Phi$ is RIP$_{p,2}$ with $2 \le p < \infty$, the approximation error of
the BPDQp decoders is the sum of two terms: one that expresses the compressibility error as measured
by $e_0(K)$, and one, the noise error, proportional to the ratio $\epsilon/\mu_{p,2}$. In particular, by Proposition 1, for $m$
respecting (4), a SGR sensing matrix of $m$ rows induces with a controlled probability

$$\|x - x_p^*\|_2 \le A_p\, e_0(K) + B_p\, \frac{\epsilon}{\mu_{p,2}}. \tag{8}$$

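Combining (8) and the result of Lemma 1, we may bound the noise error for uniform quantization more
precisely. Indeed, for $2 \le p < \infty$, if $m \ge (p - 1)\, 2^{p+1}$, $\mu_{p,2} \ge \frac{p-1}{p}\, \nu_p\, m^{1/p}$ with $\nu_p = \sqrt{2}\, \pi^{-\frac{1}{2p}}\, \Gamma(\tfrac{p+1}{2})^{\frac{1}{p}}$.

In addition, using a variant of the Stirling formula found in [37], we know that
$|\Gamma(x) - (\tfrac{2\pi}{x})^{1/2} (\tfrac{x}{e})^x| \le \tfrac{1}{9x} (\tfrac{2\pi}{x})^{1/2} (\tfrac{x}{e})^x$ for $x \ge 1$. Therefore, we compute easily that, for
$x = (p+1)/2 > 1$, $\nu_p \ge c^{1/p}\, (\tfrac{p+1}{e})^{1/2} \ge \tfrac{8\sqrt{2}}{9e}\, (p+1)^{1/2}$, with $c = \tfrac{8\sqrt{2}}{9\sqrt{e}} < 1$. Finally, by (7), we see that

$$\frac{\epsilon_p(\alpha)}{\mu_{p,2}} \le C\, \frac{\alpha}{\sqrt{p+1}}, \tag{9}$$

with $C = 9e/(8\sqrt{2}) < 2.17$, where we used the bound $\frac{p}{p-1} \le 2$ and the fact that $\big((p+1)^{-1} + \kappa\, m^{-1/2}\big)^{1/p} < 1$
if $m > (\tfrac{p+1}{p} \kappa)^2 = O(1)$.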

In summary, we can formulate the following principle.

Oversampling Principle. The noise error term in the $\ell_2 - \ell_1$ instance optimality relation in the case
of uniform quantization of the measurements of a sparse or compressible signal is divided by $\sqrt{p+1}$
in oversampled SGR sensing, i.e., when the oversampling factor $m/K$ is higher than a minimal value
increasing with $p$.

Interestingly, this follows the improvement achieved by adding a QC constraint in the decoding of
oversampled ADC signal conversion [8].
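
The constant behind this principle can be checked numerically. The Python sketch below assumes that the lower bound on $\nu_p$ used above reads $\nu_p \ge \sqrt{p+1}/C$ with $C = 9e/(8\sqrt{2})$, and compares it with the exact value $\nu_p = (2^{p/2} \pi^{-1/2} \Gamma(\frac{p+1}{2}))^{1/p}$ over an arbitrary illustrative range of moments; the names are hypothetical.

```python
import math

def nu_p(p):
    """nu_p = (E|g|^p)^(1/p) for a standard Gaussian g."""
    moment = 2.0 ** (p / 2.0) * math.gamma((p + 1) / 2.0) / math.sqrt(math.pi)
    return moment ** (1.0 / p)

C = 9.0 * math.e / (8.0 * math.sqrt(2.0))   # constant of the oversampling principle

# (p, exact nu_p, assumed lower bound sqrt(p + 1) / C) for p = 2, ..., 40.
checks = [(p, nu_p(p), math.sqrt(p + 1) / C) for p in range(2, 41)]
bound_holds = all(exact >= bound for _, exact, bound in checks)

# Factor by which the noise term shrinks, relative to p = 2, if only sqrt(p + 1) matters.
gain_vs_bpdn = {p: math.sqrt((p + 1) / 3.0) for p in (2, 4, 10)}
```

For instance, moving from $p = 2$ to $p = 10$ divides the $\sqrt{p+1}$ factor by about 1.9, provided the oversampling requirement is met.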
The oversampling principle requires some additional explanations. Taking a SGR matrix, by Proposition
1, if $m_p$ is the smallest number of measurements for which such a randomly generated matrix $\Phi$ is RIP$_{p,2}$
of radius $\delta_p < 1$ with a certain nonzero probability, taking $m > m_p$ allows one to generate a new random
matrix with a smaller radius $\delta < \delta_p$ with the same probability of success.

Therefore, increasing the oversampling factor $m/K$ provides two effects. First, it enables one to hope
for a matrix $\Phi$ that is RIP$_{p,2}$ for high $p$, providing the desired error division by $\sqrt{p+1}$. Second, as shown
in Appendix B, since $\delta = O(m^{-1/p} \sqrt{\log m})$, oversampling gives a smaller $\delta$, hence counteracting the
increase of $p$ in the factor $C_p$ of the values $A_p \ge 2$ and $B_p \ge 4$. This decrease of $\delta$ also favors BPDN,
but since the values $A = A_2 \ge 2$ and $B = B_2 \ge 4$ in (3) are also bounded from below this effect is
limited. Consequently, as the number of measurements increases the improvement in reconstruction error
for BPDN will saturate, while for BPDQp the error will be divided by $\sqrt{p+1}$.

From this result, it is very tempting to choose an extremely large value for $p$ in order to decrease the
noise error term (8). There are however two obstacles with this. First, the instance optimality result of
Theorem 2 is not directly valid for $p = \infty$. Second, and more significantly, the necessity of satisfying
RIP$_{p,2}$ implies that we cannot take $p$ arbitrarily large in Proposition 1. Indeed, for a given oversampling
factor $m/K$, a SGR matrix $\Phi$ can be RIP$_{p,2}$ only over a finite interval $p \in [2, p_{\max}]$. This implies that
for each particular reconstruction problem, there should be an optimal maximum value for $p$. We will
demonstrate this effect experimentally in Section VII.



We remark that the compressibility error is not significantly reduced by increasing $p$ when the number
of measurements is large. This makes sense as the $\ell_p$-norm appears only in the fidelity term of the
decoders, and we know that in the case where $\epsilon = 0$ the compressibility error remains in the BP decoder
[22]. Finally, note that due to the embedding of the $\ell_p$-norms, i.e., $\|\cdot\|_p \le \|\cdot\|_{p'}$ if $p \ge p' \ge 1$, increasing
$p$ until $p_{\max}$ makes the fidelity term closer to the QC.



VI. Numerical Implementation 

This section is devoted to the description of the convex optimization tools needed to numerically solve
the Basis Pursuit DeQuantizer program. While we generally utilize $p \ge 2$, the BPDQp program is convex
for $p \ge 1$. In fact, the efficient iterative procedure we describe will converge to the global minimum
of the BPDQp program for all $p \ge 1$.
A. Proximal Optimization

The BPDQp (and BPDN) decoders are special cases of a general class of convex problems [18], [38]

$$\operatorname*{argmin}_{x \in \mathcal{H}}\ f_1(x) + f_2(x), \tag{P}$$

where $\mathcal{H} = \mathbb{R}^N$ is seen as a Hilbert space equipped with the inner product $\langle x, z \rangle = \sum_i x_i z_i$. We
denote by ${\rm dom} f = \{x \in \mathcal{H} : f(x) < +\infty\}$ the domain of any $f : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$. In (P), the
functions $f_1, f_2 : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ are assumed (i) convex functions which are not infinite everywhere,
i.e., ${\rm dom} f_1, {\rm dom} f_2 \ne \emptyset$, (ii) ${\rm dom} f_1 \cap {\rm dom} f_2 \ne \emptyset$, and (iii) these functions are lower semi-continuous
(lsc) meaning that $\liminf_{x \to x_0} f(x) = f(x_0)$ for all $x_0 \in {\rm dom} f$. The class of functions satisfying these
three properties is denoted $\Gamma_0(\mathbb{R}^N)$. For BPDQp, these two non-differentiable functions are $f_1(x) = \|x\|_1$
and $f_2(x) = \imath_{T^p(\epsilon)}(x) = 0$ if $x \in T^p(\epsilon)$ and $\infty$ otherwise, i.e., the indicator function of the closed convex
set $T^p(\epsilon) = \{x \in \mathbb{R}^N : \|y_q - \Phi x\|_p \le \epsilon\}$.

It can be shown that the solutions of problem (P) are characterized by the following fixed point
equation: $x$ solves (P) if and only if

$$x = \big(\mathbb{1} + \beta\, \partial(f_1 + f_2)\big)^{-1}(x), \quad \text{for } \beta > 0. \tag{10}$$

The operator $J_{\beta \partial f} = (\mathbb{1} + \beta\, \partial f)^{-1}$ is called the resolvent operator associated to the subdifferential
operator $\partial f$, $\beta$ is a positive scalar known as the proximal step size, and $\mathbb{1}$ is the identity map on $\mathcal{H}$. We
recall that the subdifferential of a function $f \in \Gamma_0(\mathcal{H})$ at $x \in \mathcal{H}$ is the set-valued map $\partial f(x) = \{u \in
\mathcal{H} : \forall z \in \mathcal{H},\ f(z) \ge f(x) + \langle u, z - x \rangle\}$, where each element $u$ of $\partial f$ is called a subgradient.

The resolvent operator is actually identified with the proximity operator of $\beta f$, i.e., $J_{\beta \partial f} = {\rm prox}_{\beta f}$,
introduced in [39] as a generalization of the convex projection operator. It is defined as the unique solution
${\rm prox}_f(x) = \operatorname*{argmin}_{z \in \mathcal{H}} \tfrac{1}{2} \|z - x\|_2^2 + f(z)$ for $f \in \Gamma_0(\mathcal{H})$. If $f = \imath_C$ for some closed convex set
$C \subset \mathcal{H}$, ${\rm prox}_f(x)$ is equivalent to orthogonal projection onto $C$. For $f(x) = \|x\|_1$, ${\rm prox}_{\gamma f}(x)$ is given
by component-wise soft-thresholding of $x$ by threshold $\gamma$ [18]. In addition, proximity operators of lsc
convex functions exhibit nice properties with respect to translation, composition with frame operators,
dilation, etc. [40], [38].
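
As a minimal illustration of the proximity operator of the $\ell_1$-norm, the Python sketch below implements component-wise soft-thresholding and evaluates the objective $\frac{1}{2}\|z - x\|_2^2 + \gamma \|z\|_1$ that it minimizes. The function names and the test vector are illustrative assumptions.

```python
def soft_threshold(x, gamma):
    """Component-wise soft-thresholding: sign(x_i) * max(|x_i| - gamma, 0)."""
    out = []
    for t in x:
        mag = max(abs(t) - gamma, 0.0)
        out.append(mag if t >= 0 else -mag)
    return out

def prox_objective(z, x, gamma):
    """Objective minimized by prox_{gamma ||.||_1}(x): 0.5 ||z - x||^2 + gamma ||z||_1."""
    return sum(0.5 * (a - b) ** 2 + gamma * abs(a) for a, b in zip(z, x))

x = [3.0, -0.5, 1.0, -2.0]
gamma = 1.0
z = soft_threshold(x, gamma)   # entries with |x_i| <= gamma are set to zero
```

Entries whose magnitude does not exceed the threshold are zeroed, which is the mechanism by which the $\ell_1$ term promotes sparsity.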

In problem (P) with $f = f_1 + f_2$, the resolvent operator $J_{\beta \partial f} = \big(\mathbb{1} + \beta\, \partial(f_1 + f_2)\big)^{-1}$ typically cannot be
calculated in closed-form. Monotone operator splitting methods do not attempt to evaluate this resolvent
mapping directly, but instead perform a sequence of calculations involving separately the individual
resolvent operators $J_{\beta \partial f_1}$ and $J_{\beta \partial f_2}$. The latter are hopefully easier to evaluate, and this holds true for
our functionals in BPDQp.

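Since for BPDQp, both $f_1$ and $f_2$ are non-differentiable, we use a particular monotone operator splitting
method known as the Douglas-Rachford (DR) splitting. It can be written as the following compact
recursion formula [18]

$$x^{(t+1)} = \big(1 - \tfrac{\alpha_t}{2}\big)\, x^{(t)} + \tfrac{\alpha_t}{2}\, S_\gamma^{\odot} \circ P_{T^p(\epsilon)}^{\odot}\big(x^{(t)}\big), \tag{11}$$

where $A^{\odot} = 2A - \mathbb{1}$ for any operator $A$, $\alpha_t \in (0, 2)$ for all $t \in \mathbb{N}$, $S_\gamma = {\rm prox}_{\gamma f_1}$ is the
component-wise soft-thresholding operator with threshold $\gamma > 0$ and $P_{T^p(\epsilon)} = {\rm prox}_{f_2}$ is the orthogonal projection
onto the tube $T^p(\epsilon)$. From [19], one can show that the sequence $(x^{(t)})_{t \in \mathbb{N}}$ converges to some point $x^*$
and $P_{T^p(\epsilon)}(x^*)$ is a solution of BPDQp. In the next Section, we provide a way to compute $P_{T^p(\epsilon)}(x^*)$
efficiently.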
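
The following Python sketch runs a Douglas-Rachford recursion of this type on a deliberately simple toy instance. It assumes that the recursion reads $x^{(t+1)} = (1 - \frac{\alpha_t}{2}) x^{(t)} + \frac{\alpha_t}{2} S_\gamma^{\odot} \circ P^{\odot}(x^{(t)})$ with $A^{\odot} = 2A - \mathbb{1}$, and it takes $\Phi = {\rm Id}$ and $p = \infty$ so that the projection onto the tube is a plain clipping. These simplifications, the function names and the parameter values are assumptions made for illustration; this is not the solver used for the experiments of this paper.

```python
def soft_threshold(x, gamma):
    """Component-wise soft-thresholding with threshold gamma."""
    out = []
    for t in x:
        mag = max(abs(t) - gamma, 0.0)
        out.append(mag if t >= 0 else -mag)
    return out

def project_tube_inf(x, y, eps):
    """Projection onto {x : ||y - x||_inf <= eps}, i.e., the tube when Phi = Id and p = inf."""
    return [min(max(t, yi - eps), yi + eps) for t, yi in zip(x, y)]

def douglas_rachford(y, eps, gamma=1.0, step=1.0, iters=200):
    """Toy DR recursion for min ||u||_1 s.t. ||y - u||_inf <= eps (step plays the role of alpha_t)."""
    x = [0.0] * len(y)
    for _ in range(iters):
        px = project_tube_inf(x, y, eps)
        rp = [2.0 * a - b for a, b in zip(px, x)]        # reflected projection
        s = soft_threshold(rp, gamma)
        rs = [2.0 * a - b for a, b in zip(s, rp)]        # reflected soft-thresholding
        x = [(1.0 - step / 2.0) * a + (step / 2.0) * b for a, b in zip(x, rs)]
    return project_tube_inf(x, y, eps)                   # the solution is the projected limit

y = [3.0, 0.5, -3.0, 1.2]
solution = douglas_rachford(y, eps=1.0)
```

In this separable toy case the minimizer is known in closed form: each coordinate is the point of $[y_i - \epsilon, y_i + \epsilon]$ closest to zero, which gives a direct check of the recursion.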



B. Proximity operator of the $\ell_p$ fidelity constraint

Each step of the DR iteration (11) requires computation of ${\rm prox}_{f_2} = P_{T^p(\epsilon)}$ for $T^p(\epsilon) = \{x \in \mathbb{R}^N :
\|y_q - \Phi x\|_p \le \epsilon\}$. We present an iterative method to compute this projection for $2 \le p \le \infty$.

Notice first that, defining the unit $\ell_p$ ball $B^p = \{y \in \mathbb{R}^m : \|y\|_p \le 1\} \subset \mathbb{R}^m$, we have

$$f_2(x) = \imath_{T^p(\epsilon)}(x) = (\imath_{B^p} \circ A_\epsilon)(x),$$

with the affine operator $A_\epsilon(x) = \tfrac{1}{\epsilon}(\Phi x - y_q)$.

The proximity operator of a pre-composition of a function $f \in \Gamma_0(\mathcal{H})$ with an affine operator can be
computed from the proximity operator of $f$. Indeed, let $\Phi' \in \mathbb{R}^{m \times N}$ and the affine operator $A(x) = \Phi' x - y$
with $y \in \mathbb{R}^m$. If $\Phi'$ is a tight frame of $\mathcal{H}$, i.e., $\Phi' \Phi'^* = c\, \mathbb{1}$ for some $c > 0$, we have

$${\rm prox}_{f \circ A}(x) = x + c^{-1}\, \Phi'^* \big({\rm prox}_{cf} - \mathbb{1}\big)(A(x)),$$

[40], [18]. Moreover, for a general bounded matrix $\Phi'$ we can use the following lemma.



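Lemma 4 dHH). Let $\Phi' \in \mathbb{R}^{m \times N}$ be a matrix with bounds $0 \le c_1 < c_2 < \infty$ such that $c_1\,\mathrm{Id} \preceq \Phi'\Phi'^* \preceq c_2\,\mathrm{Id}$, and let $\{\beta_t\}_{t \in \mathbb{N}}$ be a sequence with $0 < \inf_t \beta_t \le \sup_t \beta_t < 2/c_2$. Define

$u^{(t+1)} = \beta_t\, (\mathrm{Id} - \mathrm{prox}_{\beta_t^{-1} f})(\beta_t^{-1} u^{(t)} + A(p^{(t)}))$, $\quad p^{(t+1)} = x - \Phi'^* u^{(t+1)}$. (12)

If the matrix $\Phi'$ is a general frame of $\mathcal{H}$, i.e., $0 < c_1 < c_2 < \infty$, then $f \circ A \in \Gamma_0(\mathcal{H})$. In addition, $u^{(t)} \to \bar{u} \in \mathbb{R}^m$ and $p^{(t)} \to \mathrm{prox}_{f \circ A}(x) = x - \Phi'^* \bar{u}$ in (12). More precisely, both $u^{(t)}$ and $p^{(t)}$ converge linearly and the best convergence rate is attained for $\beta_t \equiv 2/(c_1 + c_2)$ with $\|u^{(t)} - \bar{u}\| \le \left(\frac{c_2 - c_1}{c_2 + c_1}\right)^t \|u^{(0)} - \bar{u}\|$.

Otherwise, if $\Phi'$ is just bounded (i.e., $c_1 = 0 < c_2 < \infty$), and if $f \circ A \in \Gamma_0(\mathcal{H})$, apply (12), and $u^{(t)} \to \bar{u}$ and $p^{(t)} \to \mathrm{prox}_{f \circ A}(x) = x - \Phi'^* \bar{u}$ at the rate $O(1/t)$.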
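A short sketch may help to see how the inner iteration of Lemma 4 operates. It is written here as a forward-backward step on a dual variable $u$, using the Moreau identity, which is one standard way to realize a recursion of this kind; the function names, the zero initialization of $u$, and the constant step `beta` are assumptions of this example rather than details taken from the text.

```python
import numpy as np

def prox_f_after_affine(x, Phi_p, y, prox_f, beta, n_iter=200):
    """Approximate prox_{f o A}(x) for the affine map A(x) = Phi_p @ x - y.

    prox_f(v, tau) must return prox_{tau * f}(v).
    beta is a constant step, assumed to satisfy 0 < beta < 2 / c2,
    where c2 is the largest eigenvalue of Phi_p @ Phi_p.T.
    """
    u = np.zeros(Phi_p.shape[0])          # dual variable, initialized at zero (assumption)
    p = x - Phi_p.T @ u
    for _ in range(n_iter):
        v = u / beta + (Phi_p @ p - y)
        u = beta * (v - prox_f(v, 1.0 / beta))
        p = x - Phi_p.T @ u
    return p

def proj_linf_ball(v, tau=1.0):
    """Prox of the indicator of the unit l_inf ball; tau has no effect on an indicator."""
    return np.clip(v, -1.0, 1.0)
```

For a tight frame with $\Phi'\Phi'^* = c\,\mathrm{Id}$ and `beta = 1/c`, the first pass of the loop already reproduces the closed-form expression given before the lemma, which is a convenient sanity check.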

In conclusion, computing $\mathrm{prox}_{f_2}$ may be reduced to applying the orthogonal projection $\mathrm{prox}_{\imath_{B^p}} = P_{B^p}$ by setting $f = \imath_{B^p}$, $\Phi' = \Phi/\epsilon$ and $y = y_q/\epsilon$ inside the iterative method (12), with a number of iterations depending on the selected application (see Section VII).

For $p = 2$ and $p = \infty$, the projector $P_{B^p}$ has an explicit form. Indeed, if $y$ is outside the closed unit $\ell_p$-ball in $\mathbb{R}^m$, then $P_{B^2}(y) = y/\|y\|_2$ and $(P_{B^\infty}(y))_i = \mathrm{sign}(y_i) \times \min\{1, |y_i|\}$ for $1 \le i \le m$.
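These two closed forms translate directly into code. The sketch below uses hypothetical function names and handles points inside the ball as well, returning them unchanged.

```python
import numpy as np

def project_l2_ball(y):
    """Projection onto the unit l2 ball: radial rescaling when ||y||_2 > 1."""
    return y / max(1.0, np.linalg.norm(y))

def project_linf_ball(y):
    """Projection onto the unit l_inf ball: component-wise clipping to [-1, 1]."""
    return np.sign(y) * np.minimum(1.0, np.abs(y))
```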

Unfortunately, for $2 < p < \infty$ no known closed form for the projection exists. Instead, we describe an iterative method. Set $f_y(u) = \frac{1}{2}\|u - y\|_2^2$ and $g(u) = \|u\|_p^p$.

If $\|y\|_p \le 1$, $P_{B^p}(y) = y$. For $\|y\|_p > 1$, the projection $P_{B^p}(y)$ is the solution of the constrained minimization problem $u^\star = \arg\min_u f_y(u)$ s.t. $g(u) = 1$. Let $L(u, \lambda)$ be its Lagrange function (for $\lambda \in \mathbb{R}$)

$L(u, \lambda) = f_y(u) + \lambda\,(g(u) - 1)$. (13)

Without loss of generality, by symmetry, we may work in the positive* orthant $u_i \ge 0$ and $y_i \ge 0$, since the point $y$ and its projection $u^\star$ belong to the same orthant of $\mathbb{R}^m$, i.e., $y_i u_i^\star \ge 0$ for all $1 \le i \le m$.

*The general solution can be obtained by appropriate axis mirroring.

As $f_y$ and $g$ are continuously differentiable, the Karush-Kuhn-Tucker system corresponding to (13) is

$\nabla_u L(u^\star, \lambda^\star) = \nabla_u f_y(u^\star) + \lambda^\star \nabla_u g(u^\star) = 0$, $\quad \nabla_\lambda L(u^\star, \lambda^\star) = g(u^\star) - 1 = 0$, (14)

where the solution $u^\star$ is non-degenerate by strict convexity in $u$ BTI . and $\lambda^\star$ the corresponding Lagrange multiplier.

Let us write $z = (u, z_{m+1} = \lambda) \in \mathbb{R}^{m+1}$ and $F = \nabla_z L : \mathbb{R}^{m+1} \to \mathbb{R}^{m+1}$ as

$F_i(z) = z_i + p\, z_{m+1}\, z_i^{p-1} - y_i$ if $i \le m$,

$F_i(z) = (\sum_{j=1}^m z_j^p) - 1$ if $i = m + 1$.

The KKT system (14) is equivalent to $F(z^\star) = 0$, where the desired projection $u^\star$ is then given by the first $m$ coordinates of $z^\star$. This defines a system of $m + 1$ equations with $m + 1$ unknowns $(u^\star, \lambda^\star)$ that we can solve efficiently with the Newton method. This is the main strategy underlying sequential quadratic programming used to solve general-type constrained optimization problems iHTI .

Given an initialization point $z^0$, the successive iterates are defined by

$z^{n+1} = z^n - V(z^n)^{-1} F(z^n)$, (15)

where $V_{ij} = \partial F_i / \partial z_j$ is the Jacobian associated to $F$. If the iterates sequence $(z^n)_{n \ge 0}$ is close enough to $(u^\star, \lambda^\star)$, we know that the Jacobian is nonsingular as $u^\star$ is non-degenerate. Moreover, since that Jacobian has a simple block-invertible form, we may compute ( H2l . p. 125)

$V^{-1}(z) = \frac{1}{\mu} \begin{pmatrix} \mu D^{-1} - \tilde{b}\tilde{b}^T & \tilde{b} \\ \tilde{b}^T & -1 \end{pmatrix}$, (16)

where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with $D_{ii}(z) = 1 + p(p-1)\, z_{m+1}\, z_i^{p-2}$, $b \in \mathbb{R}^m$ with $b_i(z) = p\, z_i^{p-1}$ for $1 \le i \le m$, $\tilde{b} = D^{-1} b$, and $\mu = b^T D^{-1} b = \tilde{b}^T D\, \tilde{b}$. This last expression can be computed efficiently as $D$ is diagonal.

We initialize the first $m$ components of $z^0$ by the direct radial projection of $y$ on the unit $\ell_p$-ball, $u^0 = y/\|y\|_p$, and initialize $z^0_{m+1} = \arg\min_\lambda \|F(u^0, \lambda)\|_2$.

In summary, to compute $P_{B^p}$, we run (15) using (16) to calculate each update step. We terminate the iteration when the norm $\|F(z^n)\|_2$ falls below a specified tolerance. Since the Newton method converges superlinearly, we obtain error comparable to machine precision with typically fewer than 10 iterations.
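The Newton scheme above can be sketched as follows for a finite $p > 2$. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `project_lp_ball`, the tolerance, and the iteration cap are hypothetical, and the update solves the block system by elimination (which is algebraically equivalent to applying the block inverse) instead of forming the inverse explicitly. The multiplier is initialized by the least-squares fit described in the text.

```python
import numpy as np

def project_lp_ball(y, p, tol=1e-12, max_iter=50):
    """Euclidean projection of y onto the unit l_p ball for finite p > 2."""
    y = np.asarray(y, dtype=float)
    if np.sum(np.abs(y) ** p) <= 1.0:
        return y.copy()                        # already inside the ball
    sign, a = np.sign(y), np.abs(y)            # work in the positive orthant
    u = a / np.linalg.norm(a, ord=p)           # radial projection as starting point
    b = p * u ** (p - 1)
    lam = -(b @ (u - a)) / (b @ b)             # least-squares fit of the multiplier
    for _ in range(max_iter):
        Fu = u + p * lam * u ** (p - 1) - a    # first m components of F
        Fl = np.sum(u ** p) - 1.0              # last component of F
        if np.sqrt(Fu @ Fu + Fl ** 2) < tol:
            break
        D = 1.0 + p * (p - 1) * lam * u ** (p - 2)   # diagonal block of the Jacobian
        b = p * u ** (p - 1)
        b_tilde = b / D
        mu = b @ b_tilde
        d_lam = (b_tilde @ Fu - Fl) / mu       # Newton step for the multiplier
        d_u = (Fu - b * d_lam) / D             # Newton step for u
        u, lam = u - d_u, lam - d_lam
    return sign * u                            # mirror back to the original orthant
```

Each iteration costs $O(m)$ operations because $D$ is diagonal, which is what makes this projection cheap enough to be called inside every DR step.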






VII. Experiments 

As an experimental validation of the BPDQp method, we ran two sets of numerical simulations for reconstructing signals from quantized measurements. For the first experiment we studied recovery of exactly sparse random 1-D signals, following very closely our theoretical developments. Setting the dimension $N = 1024$ and the sparsity level $K = 16$, we generated 500 $K$-sparse signals where the non-zero elements were drawn from the standard Gaussian distribution $N(0, 1)$, and located at supports drawn uniformly in $\{1, \dots, N\}$. For each sparse signal $x$, $m$ quantized measurements were recorded as in the quantized measurement model with a SGR matrix $\Phi \in \mathbb{R}^{m \times N}$. The bin width was set to $\alpha = \|\Phi x\|_\infty / 40$.
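One trial of this setup can be generated as sketched below. The quantizer convention (each value mapped to the midpoint of its bin of width $\alpha$), the helper names, the seed, and the choice $m = 320$ (so that $m/K = 20$) are assumptions of this example; the text only fixes $N$, $K$ and the bin width rule.

```python
import numpy as np

def sparse_signal(N, K, rng):
    """K-sparse signal: uniformly random support, standard Gaussian non-zero values."""
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    x[support] = rng.standard_normal(K)
    return x

def quantize(z, alpha):
    """Uniform quantizer mapping each value to the midpoint of its bin (assumed convention)."""
    return alpha * np.floor(z / alpha) + alpha / 2.0

rng = np.random.default_rng(0)
N, K, m = 1024, 16, 320
x = sparse_signal(N, K, rng)
Phi = rng.standard_normal((m, N))           # standard Gaussian random (SGR) matrix
z = Phi @ x
alpha = np.linalg.norm(z, np.inf) / 40.0    # bin width rule from the text
y_q = quantize(z, alpha)
```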

The decoding was accomplished with BPDQp for various moments $p \ge 2$ using the optimization algorithm described in Section VI. In particular, the overall Douglas-Rachford procedure (11) was run for 500 iterations. At each DR step, the method in (12) was iterated until the relative error



Ib<"ll2 

fell below 10^^; the required number of iterations was dependent on m but was fewer than 700 in all 
cases examined. 

In Figure 1, we plot the average quality of the reconstructions of BPDQp for various values of $p \ge 2$ and $m/K \in [10, 40]$. We use the quality measure $\mathrm{SNR}(\hat{x}; x) = 20 \log_{10} \frac{\|x\|_2}{\|x - \hat{x}\|_2}$, where $x$ is the true original signal and $\hat{x}$ the reconstruction. As can be noticed, at higher oversampling factors $m/K$ the decoders with higher $p$ give better reconstruction performance. It can also be observed that at lower oversampling factors, increasing $p$ beyond a certain point degrades the reconstruction performance. These two effects are consistent with the remarks noted at the end of Section V, as the sensing matrices may fail to satisfy the RIP$_{p,2}$ if $p$ is too large for a given oversampling factor.

One of the original motivations for the BPDQp decoders is that they are closer to enforcing quantization consistency than the BPDN decoder. To check this, we have examined the "quantization consistency fraction", i.e., the average fraction of remeasured coefficients $(\Phi\hat{x})_i$ that satisfy $|(\Phi\hat{x})_i - y_i| \le \alpha/2$. These are shown in Figure 1(c) for various $p$ and $m/K$. As expected, it can be clearly seen that increasing $p$ increases the QC fraction.
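Both figures of merit are simple to evaluate once a reconstruction is available. The helpers below are a minimal sketch with hypothetical names; `x_hat` stands for the output of any decoder.

```python
import numpy as np

def snr_db(x_hat, x):
    """Reconstruction quality 20 * log10(||x||_2 / ||x - x_hat||_2), in dB."""
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))

def qc_fraction(Phi, x_hat, y_q, alpha):
    """Fraction of remeasured coefficients lying within alpha/2 of the quantized values."""
    return np.mean(np.abs(Phi @ x_hat - y_q) <= alpha / 2.0)
```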

An even more explicit illustration of this effect is afforded by examining histograms of the normalized residual $\alpha^{-1}(\Phi\hat{x} - y)_i$ for different $p$. For reconstruction exactly satisfying QC, these normalized residuals should be supported on $[-1/2, 1/2]$. In Figure 2, we show histograms of normalized residuals for $p = 2$ and $p = 10$, for the case $m/K = 40$. The histogram for $p = 10$ is indeed closer to uniform on $[-1/2, 1/2]$.

Fig. 2. Histograms of $\alpha^{-1}(\Phi\hat{x} - y)_i$. Left, $p = 2$. Right, $p = 10$.

For the second experiment, we apply a modified version of the BPDQp to an undersampled MRI reconstruction problem. Using an example similar to [43], the original image is a 256 x 256 pixel "synthetic angiogram", i.e., $N = 256^2$, comprised of 10 randomly placed non-overlapping ellipses. The linear measurements are the real and imaginary parts of a fraction $\rho$ of the Fourier coefficients at randomly selected locations in Fourier space, giving $m = \rho N$ independent measurements. These random locations form the index set $\Omega \subset \{1, \dots, N\}$ with $|\Omega| = m$. Experiments were carried out with $\rho \in \{1/6, 1/8, 1/12\}$, but we show results only for $\rho = 1/8$. These were quantized with a bin width $\alpha = 50$, giving at most 12 quantization levels for each measurement.



For this example, we modify the BPDQp program by replacing the $\ell_1$ term by the total variation (TV) semi-norm fl4l . This yields the problem

$\arg\min_u \|u\|_{TV}$ s.t. $\|y - \Phi u\|_p \le \epsilon$,

where $\Phi = F_\Omega$ is the restriction of the Discrete Fourier Transform matrix $F$ to the rows indexed in $\Omega$.



This may be solved with the Douglas-Rachford iteration (11), with the modification that $S_\gamma$ be replaced by the proximity operator associated to $\gamma$ times the TV norm, i.e., by $\mathrm{prox}_{\gamma\|\cdot\|_{TV}}(y) = \arg\min_u \frac{1}{2}\|y - u\|^2 + \gamma\|u\|_{TV}$. The latter is known as the Rudin-Osher-Fatemi model, and numerous methods exist for solving it exactly, including P3]| . [46], BTll . BSl . In this work, we use an efficient projected gradient descent algorithm on the dual problem, see e.g., [18]. Note that the sensing matrix $F_\Omega$ is actually a tight frame, i.e., $F_\Omega F_\Omega^* = \mathrm{Id}$, so we do not need the nested inner iteration (12).
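The tight-frame property of the restricted Fourier operator is easy to check numerically. The sketch below is an illustration under simplifying assumptions: it keeps the selected coefficients as complex numbers instead of stacking real and imaginary parts, it uses the unitary ("ortho") normalization of the FFT, and the factory name `make_partial_fourier` is hypothetical.

```python
import numpy as np

def make_partial_fourier(shape, mask):
    """Return (forward, adjoint) for the unitary 2-D DFT restricted to a boolean mask."""
    def forward(u):
        # F_Omega u: keep only the selected Fourier coefficients.
        return np.fft.fft2(u, norm="ortho")[mask]

    def adjoint(c):
        # F_Omega^* c: zero-fill the unselected coefficients, then invert the DFT.
        full = np.zeros(shape, dtype=complex)
        full[mask] = c
        return np.fft.ifft2(full, norm="ortho")

    return forward, adjoint
```

Applying `forward` after `adjoint` returns the input coefficients unchanged, which is the identity $F_\Omega F_\Omega^* = \mathrm{Id}$ that lets the projection onto the tube be computed without the inner iteration.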



We show the SNR of the BPDQp reconstructions as a function of $p$ in Figure 3, averaged over 50 trials where both the synthetic angiogram image and the Fourier measurement locations are randomized. This figure also depicts the SNR improvement of BPDQp-based reconstruction over BPDN. For these simulations we used 500 iterations of the Douglas-Rachford recursion (11). These quantitative results are confirmed by visual inspection of Figure 4, where we compare 100 x 100 pixel details of the reconstruction results with BPDN and with BPDQp for $p = 10$, for one particular instance of the synthetic angiogram signal.

Fig. 3. Average SNR (solid) and SNR improvement over BPDN (dashed) as a function of p, for the synthetic angiogram reconstruction simulations. Error bars indicate 1 standard deviation.

Note that this experiment lies far outside of the justification provided by our theoretical developments, as we do not have any proof that the sensing matrix $F_\Omega$ satisfies the RIP$_{p,2}$, and our theory was developed only for $\ell_1$ synthesis-type regularization, while the TV regularization is of analysis type. Nonetheless, we obtain results analogous to the previous 1-D example; the BPDQp reconstruction shows improvements both in SNR and visual quality compared to BPDN. These empirical results suggest that the BPDQp method may be useful for a wider range of quantized reconstruction problems, and also motivate further theoretical study.

VIII. Conclusion and Further Work 

The objective of this paper was to show that the BPDN reconstruction program commonly used in 
Compressed Sensing with noisy measurements is not always adapted to quantization distortion. We 
introduced a new class of decoders, the Basis Pursuit DeQuantizers, and we have shown both theoretically 
and experimentally that BPDQp exhibit a substantial reduction of the reconstruction error in oversampled 
situations. 

A first interesting question for further study would be to characterize the evolution of the optimal moment $p$ with the oversampling ratio. This would allow for instance the selection of the best BPDQp decoder as a function of the precise CS coding/decoding scenario. Second, it is also worth investigating the existence of other RIP$_{p,2}$ random matrix constructions, e.g., using the Random Fourier Ensemble. Third, a more realistic coding/decoding scenario should set $\alpha$ theoretically as a function of the bit budget (rate) available to quantize the measurements, of the sensing matrix and of some a priori on the signal energy. This should be linked also to the way our approach can integrate the saturation of the quantized measurements [16]. Finally, we would like to extend our approach to non-uniform scalar quantization of random measurements, generalizing the quantization consistency and the optimization fidelity term to this more general setting.

Appendix A
Proof of Proposition 1

Before proving Proposition 1, let us recall some facts of measure concentration [49], [50].
In particular, we are going to use the concentration property of any Lipschitz function over $\mathbb{R}^m$, i.e., $F$ such that $\|F\|_{Lip} = \sup_{u, v \in \mathbb{R}^m,\, u \ne v} \frac{|F(u) - F(v)|}{\|u - v\|_2} < \infty$. If $\|F\|_{Lip} \le 1$, $F$ is said to be 1-Lipschitz.

Lemma 5 (Ledoux, Talagrand [49] (Eq. 1.6)). If $F$ is Lipschitz with $\lambda = \|F\|_{Lip}$, then, for the random vector $\xi \in \mathbb{R}^m$ with $\xi_i \sim_{iid} N(0, 1)$,

$P[\,|F(\xi) - \mu_F| > r\,] \le 2\, e^{-\frac{1}{2} r^2 \lambda^{-2}}$, for $r > 0$,
with $\mu_F = E F(\xi) = \int_{\mathbb{R}^m} F(x)\, \gamma^m(x)\, d^m x$ and $\gamma^m(x) = (2\pi)^{-m/2}\, e^{-\|x\|_2^2/2}$.

A useful tool that we will use is the concept of a net. An $\epsilon$-net ($\epsilon > 0$) of $A \subset \mathbb{R}^K$ is a subset $S$ of $A$ such that for every $t \in A$, one can find $s \in S$ with $\|t - s\|_2 \le \epsilon$. In certain cases, the size of an $\epsilon$-net can be bounded.

Lemma 6 ([50]). There exists an $\epsilon$-net $S$ of the unit sphere of $\mathbb{R}^K$ of size $|S| \le (1 + \frac{2}{\epsilon})^K$.
We will use also this fundamental result.

Lemma 7 ([50]). Let $S$ be an $\epsilon$-net of the unit sphere in $\mathbb{R}^K$. Then, if for some vectors $v_1, \dots, v_K$ in the Banach space $B$ normed by $\|\cdot\|_B$, we have $1 - \epsilon \le \|\sum_{i=1}^K s_i v_i\|_B \le 1 + \epsilon$ for all $s = (s_1, \dots, s_K) \in S \subset \mathbb{R}^K$, then

$(1 - \beta)\, \|t\|_2 \le \|\sum_{i=1}^K t_i v_i\|_B \le (1 + \beta)\, \|t\|_2$,

for all $t \in \mathbb{R}^K$, with $\beta = \frac{2\epsilon}{1 - \epsilon}$.

In our case, the Banach space is $\ell_p^m = (\mathbb{R}^m, \|\cdot\|_p)$ for $1 \le p \le \infty$, i.e., $\mathbb{R}^m$ equipped with the norm $\|u\|_p^p = \sum_{i=1}^m |u_i|^p$. With all these concepts, we can now demonstrate the main proposition.

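Proof of Proposition 1: Let $p \ge 1$. We must prove that for a SGR matrix $\Phi \in \mathbb{R}^{m \times N}$, i.e., with $\Phi_{ij} \sim_{iid} N(0, 1)$, with the right number of measurements $m$, there exist a radius $0 < \delta < 1$ and a constant $\mu_{p,2} > 0$ such that

$\mu_{p,2} \sqrt{1 - \delta}\, \|x\|_2 \le \|\Phi x\|_p \le \mu_{p,2} \sqrt{1 + \delta}\, \|x\|_2$, (17)

for all $x \in \mathbb{R}^N$ with $\|x\|_0 \le K$.

We begin with a unit sphere $S_T = \{u \in \mathbb{R}^N : \mathrm{supp}\, u = T, \|u\|_2 = 1\}$ for a fixed support $T \subset \{1, \dots, N\}$ of size $|T| = K$. Let $\mathcal{S}_\epsilon$ be an $\epsilon$-net of $S_T$. We consider the SGR random process that generates $\Phi$ and, by an abuse of notation, we identify it for a while with $\Phi$ itself. In other words, we define the random matrix $\Phi = (\Phi_1, \dots, \Phi_N) \in \mathbb{R}^{m \times N}$ where, for all $1 \le i \le N$, $\Phi_i \in \mathbb{R}^m$ is a random vector of probability density function (or pdf) $\gamma^m(u) = \prod_{j=1}^m \gamma(u_j)$ for $u \in \mathbb{R}^m$ and $\gamma(u_i) = \frac{1}{\sqrt{2\pi}}\, e^{-u_i^2/2}$ (the standard Gaussian pdf). Therefore, $\Phi$ is related to the pdf $\gamma_\Phi(\phi) = \prod_{i=1}^N \gamma^m(\phi_i)$, $\phi \in \mathbb{R}^{m \times N}$.

Since the Frobenius norm $\|\phi\|_F = (\sum_{j,k} |\phi_{jk}|^2)^{1/2}$ of $\phi$ and the pdf $\gamma_\Phi(\phi) \propto e^{-\|\phi\|_F^2/2}$ are invariant under a global rotation in $\mathbb{R}^N$ of all the rows of $\phi$, it is easy to show that for a unit vector $s \in \mathbb{R}^N$, $P_\Phi[\,|F(\Phi s) - \mu_F| > r\,] = P_\Phi[\,|F(\Phi_1) - \mu_F| > r\,] \le 2\, e^{-\frac{1}{2} r^2 \lambda^{-2}}$, using Lemma 5 on the SGR vector $\Phi_1$.

The above holds for a single $s$. To obtain a result valid for all $s \in \mathcal{S}_\epsilon$ we may use the union bound. As $|\mathcal{S}_\epsilon| \le (1 + 2\epsilon^{-1})^K$ by Lemma 6, setting $r = \epsilon \mu_F$ for $\epsilon > 0$, we obtain

$P_\Phi[\,|\mu_F^{-1} F(\Phi s) - 1| \le \epsilon\,] \ge 1 - 2\, e^{K \log(1 + 2\epsilon^{-1}) - \frac{1}{2} \epsilon^2 \mu_F^2 \lambda^{-2}}$,

holding jointly for all $s \in \mathcal{S}_\epsilon$.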

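Taking now $F(\cdot) = \|\cdot\|_p$ for $1 \le p \le \infty$, we have $\mu_F = \mu_{p,2} = E\|\xi\|_p$ for a SGR vector $\xi \in \mathbb{R}^m$. The Lipschitz value is $\lambda = \lambda_p = 1$ for $p \ge 2$, and $\lambda = \lambda_p = m^{\frac{2-p}{2p}}$ for $1 \le p \le 2$. Consequently,

$(1 - \epsilon) \le \mu_{p,2}^{-1} \|\Phi s\|_p \le (1 + \epsilon)$, (18)

for all $s \in \mathcal{S}_\epsilon$, with a probability higher than $1 - 2 \exp(K \log(1 + 2\epsilon^{-1}) - \frac{1}{2} \epsilon^2 \mu_{p,2}^2 \lambda_p^{-2})$.
We apply Lemma 7 by noting that, as $s$ has support of size $K$, (18) may be written as

$1 - \epsilon \le \|\sum_{i=1}^K s_i v_i\|_p \le 1 + \epsilon$,

where $v_i \in \mathbb{R}^m$ are the columns of $\mu_{p,2}^{-1} \Phi$ corresponding to the support of $s$ (we abuse notation to let $s_i$ range only over the support of $s$). Then according to Lemma 7 we have, with the same probability bound and for $(\sqrt{2} - 1)\delta = \frac{2\epsilon}{1 - \epsilon}$,

$\sqrt{1 - \delta}\, \|x\|_2 \le (1 - (\sqrt{2} - 1)\delta)\, \|x\|_2 \le \mu_{p,2}^{-1} \|\Phi x\|_p \le (1 + (\sqrt{2} - 1)\delta)\, \|x\|_2 \le \sqrt{1 + \delta}\, \|x\|_2$, (19)

for all $x \in \mathbb{R}^N$ with $\mathrm{supp}\, x = T$.

The result can be made independent of the choice of $T \subset \{1, \dots, N\}$ by considering that there are $\binom{N}{K} \le (eN/K)^K$ such possible supports. Therefore, applying again a union bound, (19) holds for all $K$-sparse $x$ in $\mathbb{R}^N$ with a probability higher than $1 - 2\, e^{-\frac{1}{2} \epsilon^2 \mu_{p,2}^2 \lambda_p^{-2} + K \log[\frac{eN}{K}(1 + 2\epsilon^{-1})]}$.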

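Let us bound this probability first for $1 \le p < \infty$. For $m \ge \beta^{-1} 2^{p+1}$ and $\beta^{-1} = p - 1$, Lemma 1 (page 9) tells us that $\mu_{p,2} \ge \frac{p-1}{p}\, \nu_p\, m^{\frac{1}{p}}$ with $\nu_p = \sqrt{2}\, \pi^{-\frac{1}{2p}}\, \Gamma(\frac{p+1}{2})^{\frac{1}{p}}$. A probability of success $1 - \eta$ with $\eta < 1$ is then guaranteed if we select, for $1 \le p < 2$,

$m > \frac{2}{\epsilon^2 \nu_p^2} (\frac{p}{p-1})^2 (K \log[\frac{eN}{K}(1 + 2\epsilon^{-1})] + \log \frac{2}{\eta})$,

since $\lambda_p = m^{\frac{2-p}{2p}}$, and for $2 \le p < \infty$,

$m^{2/p} > \frac{2}{\epsilon^2 \nu_p^2} (\frac{p}{p-1})^2 (K \log[\frac{eN}{K}(1 + 2\epsilon^{-1})] + \log \frac{2}{\eta})$, (20)

since $\lambda_p = 1$.

From now on, $A \ge c B$ or $A \le c B$ means that there exists a constant $c > 0$ such that these inequalities hold. According to the lower bound found in Section V, $\nu_p \ge c \sqrt{p + 1}$, implying that $\nu_p^{-2} \le c$. Since $(p/(p-1))^2 \le 4$ for any $p \ge 2$ and $\epsilon^{-1} = \frac{2 + (\sqrt{2} - 1)\delta}{(\sqrt{2} - 1)\delta} \le 6\, \delta^{-1}$, we find the new sufficient conditions,

$m > c\, \delta^{-2} (\frac{p}{p-1})^2 (K \log[\frac{eN}{K}(1 + 12\delta^{-1})] + \log \frac{2}{\eta})$,

for $1 \le p < 2$, and

$m^{2/p} > c\, \delta^{-2} (K \log[\frac{eN}{K}(1 + 12\delta^{-1})] + \log \frac{2}{\eta})$,

for $2 \le p < \infty$.

Second, in the specific case where $p = \infty$, since there exists a $\rho > 0$ such that $\mu_{\infty,2} \ge \rho^{-1} \sqrt{\log m}$, with $\lambda_\infty = 1$, the condition becomes $\log m > c\, \delta^{-2} (K \log[\frac{eN}{K}(1 + 12\delta^{-1})] + \log \frac{2}{\eta})$. ■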

Let us make some remarks about the results and the requirements of the last proposition. Notice first that for $p = 2$, we find the classical result proved in [23]. Second, as for the comparison between the common RIP$_{2,2}$ proof [23] and the tight bound found in [24], the requirements on the measurements above are possibly pessimistic, i.e., the exponent $2/p$ occurring in (20) is perhaps too small. Proposition 1 has however the merit to prove that random Gaussian matrices satisfy the RIP$_{p,2}$ in a certain range of dimensionality.

Appendix B 

Link between $\delta$ and $m$ for SGR RIP$_{p,2}$ matrices
For $2 \le p < \infty$, Proposition 1 shows that, if $\delta^2 \ge c\, m^{-2/p} (K \log[\frac{eN}{K}(1 + 12\delta^{-1})] + \log \frac{2}{\eta})$ for a certain constant $c > 0$, a SGR matrix $\Phi \in \mathbb{R}^{m \times N}$ is RIP$_{p,2}$ of order $K$ and radius $0 < \delta < 1$ with a probability higher than $1 - \eta$. Let us assume that $\delta > d\, m^{-1/p}$ for some $d > 0$. We have $\log \delta^{-1} < \frac{1}{p} \log m - \log d$, and therefore, the same event occurs with the same probability bound when $\delta^2 \ge c\, m^{-2/p} (K \log[13 e \frac{N}{K}] + \frac{K}{p} \log m - K \log d + \log \frac{2}{\eta})$. For high $m$ and for fixed $K$, $N$ and $\eta$, this provides $\delta = O(m^{-1/p} \sqrt{\log m})$, which meets the previous assumption.

Appendix C

Proof of Lemma (Tf The result for p = oo is due to (see Eq (3. 14)). Let ^ G be a SGR 
vector, i.e., r^ua Af{0, 1) for 1 ^ i ^ m, and 1 ^ p < oo. First, the inequality E||^||p ^ (E||^||^)^/p 
follows from the application of the Jensen inequality (^(E||^||p) ^ E99(||^||p) with the convex function 

= {'Y- Second, the lower bound on E||^||p arises from the observation that for / : — )• IR+ with 
f{t) = fp, and for a given to > 0, 

fit) > f{to) + f'{to){t - to) + pf"{to){t - to)\ (21) 

for all t ^ 0. 

Indeed, observe first that since f^^^at') = ap'"" f'-''\t') for a > and n G N, it is sufficient to prove 
the result for to = I. Proving ( [2T] ) amounts then to prove f{t) = tp ^ ^^^^ — equivalently, 
tp^^ + ^-^t ^ ^IT^- "^^^ UlS of this last inequality takes its minimum in t = 1 with value which 
provides the result. 



Since /xp,2 = Elieilp = and E(||e||^ - fip,2) = 0, using ^ we find 

Mp,2 > (io)^~'((2 - l)flp,2to + (| - l)(Ap,2 + ^p)) 
writing i2p^2 = and = E(||^||p — /ip 2)^ = Var||^||p. The RHS of the last inequality is maximum 

for to = /^p,2 (1 + fip2 ^p)- ^'^^ ^^^^ value, we get finally 

^p,2 ^ (Eiieiip? (1 + {Eurp)-'Yarur/p-\ 

Because of the decorrelation of the components of ^, the last inequality simplifies into 

fip,2 > nip {E\g\P)p {1 + m-'^{E\g\P)-^Yav\g\P)p~^, 

with 5~AA(0, 1). 

Moreover, since E\g\P = 2^ tt~2 r(2±i) and using the following approximation of the Gamma function 

133 |r(x) - (^)^(f)1 ^ ^ (^)^(f )^ valid for x ^ 1, we observe that 

that holds also if x = 2+1 with p ^ 1. Therefore, (E|5|p)-2 Varl^l^ ^ {iw(^)Hj^)^ - l) ^ 
^(§)5 2f and finally 

/xp,2 ^ (E|gn? {1 + c^)p~' 
for a constant c = (|)2 < 1.584 < 2 independent of p and m. ■ 
Appendix D

Proof of Lemma 2: Notice first that since $J(\lambda w) = \lambda J(w)$ for any $w \in \mathbb{R}^m$ and $\lambda \in \mathbb{R}$, it is sufficient to prove the result for $\|u\|_2 = \|v\|_2 = 1$.

The Lemma relies mainly on the geometrical properties of the Banach space $\ell_p^m = (\mathbb{R}^m, \|\cdot\|_p)$ for $p \ge 2$. In [35], [36], it is explained that this space is $p$-convex and 2-smooth. The smoothness involves in particular

$\|x + y\|_p^2 \le \|x\|_p^2 + 2 \langle J(x), y \rangle + (p - 1) \|y\|_p^2$, (22)

where $J = J_2$ and $J_r$ is the duality mapping of gauge function $t \mapsto t^{r-1}$ for $r \ge 1$. For the Hilbert space $\ell_2$, the relation (22) reduces of course to the polarization identity. For $\ell_p$, $J_r$ is the differential of $\frac{1}{r} \|\cdot\|_p^r$, i.e., $(J_r(u))_i = \|u\|_p^{r-p}\, |u_i|^{p-1}\, \mathrm{sign}\, u_i$.

The smoothness inequality (22) involves

$2 \langle J(x), y \rangle \le \|x\|_p^2 + (p - 1) \|y\|_p^2 - \|x - y\|_p^2$, (23)

where we used the change of variable $y \to -y$.

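Let us take $x = \Phi u$ and $y = t \Phi v$ with $\|u\|_0 = s$, $\|v\|_0 = s'$, $\|u\|_2 = \|v\|_2 = 1$, $\mathrm{supp}\, u \cap \mathrm{supp}\, v = \emptyset$, and for a certain $t > 0$ that we will set later. Because $\Phi$ is assumed RIP$_{p,2}$ for $s$, $s'$ and $s + s'$ sparse signals, we deduce

$2 \mu_{p,2}^{-2}\, t\, |\langle J(\Phi u), \Phi v \rangle| \le (1 + \delta_s) + (p - 1)(1 + \delta_{s'}) t^2 - (1 - \delta_{s+s'})(1 + t^2)$,

where the absolute value on the inner product arises from the invariance of the RIP bound on (23) under the change $y \to -y$. The value $2 \mu_{p,2}^{-2} |\langle J(\Phi u), \Phi v \rangle|$ is thus bounded by an expression of type $f(t) = \alpha t^{-1} + \beta t$ with $\alpha, \beta > 0$ for $p \ge 2$ given by $\alpha = \delta_s + \delta_{s+s'}$ and $\beta = (p - 2) + (p - 1)\delta_{s'} + \delta_{s+s'}$. Since the minimum of $f$ is $2\sqrt{\alpha\beta}$, we get

$\mu_{p,2}^{-2} |\langle J(\Phi u), \Phi v \rangle| \le [(\delta_s + \delta_{s+s'})(\bar{p} + \bar{p}\, \delta_{s'} + \delta_{s'} + \delta_{s+s'})]^{1/2}$, (24)

with $\bar{p} = p - 2 \ge 0$.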

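In parallel, a change $y \to x + y$ in (23) provides

$2 \langle J(x), y \rangle \le -\|x\|_p^2 + (p - 1) \|x + y\|_p^2 - \|y\|_p^2$,

where we used the fact that $\langle J(x), x \rangle = \|x\|_p^2$. By summing this inequality with (23), we have

$4 \langle J(x), y \rangle \le (p - 2) \|y\|_p^2 + (p - 1) \|x + y\|_p^2 - \|x - y\|_p^2$.

Using the RIP$_{p,2}$ on $x = \Phi u$ and $y = t \Phi v$ as above leads to

$4 \mu_{p,2}^{-2}\, t\, |\langle J(\Phi u), \Phi v \rangle| \le (p - 2)(1 + \delta_{s'}) t^2 + (p - 1)(1 + \delta_{s+s'})(1 + t^2) - (1 - \delta_{s+s'})(1 + t^2)$

$= \bar{p} + p\, \delta_{s+s'} + (2\bar{p} + \bar{p}\, \delta_{s'} + p\, \delta_{s+s'})\, t^2$,

with the same argument as before to explain the absolute value. Minimizing over $t$ as above gives

$2 \mu_{p,2}^{-2} |\langle J(\Phi u), \Phi v \rangle| \le [(\bar{p} + p\, \delta_{s+s'})(2\bar{p} + \bar{p}\, \delta_{s'} + p\, \delta_{s+s'})]^{1/2}$. (25)

Together, (24) and (25) imply $\mu_{p,2}^{-2} |\langle J(\Phi u), \Phi v \rangle| \le C_p$ with

$C_p = \min\{ [(\delta_s + \delta_{s+s'})(\delta_{s'} + \delta_{s+s'} + \bar{p}(1 + \delta_{s'}))]^{1/2},\ \frac{1}{2} [(\bar{p} + p\, \delta_{s+s'})(2\bar{p} + \bar{p}\, \delta_{s'} + p\, \delta_{s+s'})]^{1/2} \}$.

It is easy to check that $C_p = C_p(\Phi, s, s')$ behaves as $\sqrt{(\delta_s + \delta_{s+s'})(1 + \delta_{s'})\, \bar{p}}$ for $p \gg 2$, and as $\delta_{s+s'} + \frac{3}{4}(1 + \delta_{s+s'})\, \bar{p} + O(\bar{p}^2)$ for $p \simeq 2$. ■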

Appendix E 

Proof of the Theorem: Let us write $x^* = x + h$. We have to characterize the behavior of $\|h\|_2$. In the following, for any vector $u \in \mathbb{R}^d$ with $d \in \{m, N\}$, we define $u_A$ as the vector in $\mathbb{R}^d$ equal to $u$ on the index set $A \subset \{1, \dots, d\}$ and 0 elsewhere.

We define $T_0 = \mathrm{supp}\, x_K$ and a partition $\{T_k : 1 \le k \le \lceil (N - K)/K \rceil\}$ of the support of $h_{T_0^c}$. This partition is determined by ordering elements of $h$ off of the support of $x_K$ in decreasing absolute value. We have $|T_k| = K$ for all $k \ge 1$, $T_k \cap T_{k'} = \emptyset$ for $k \ne k'$, and crucially that $|h_j| \le |h_i|$ for all $j \in T_{k+1}$ and $i \in T_k$.

We start from

$\|h\|_2 \le \|h_{T_{01}}\|_2 + \|h_{T_{01}^c}\|_2$, (26)

with $T_{01} = T_0 \cup T_1$, and we are going to bound separately the two terms of the RHS. In [22], it is proved that

$\|h_{T_{01}^c}\|_2 \le \sum_{k \ge 2} \|h_{T_k}\|_2 \le \|h_{T_{01}}\|_2 + 2 e_0(K)$, (27)

with $e_0(K) = \frac{1}{\sqrt{K}} \|x_{T_0^c}\|_1$. Therefore,

$\|h\|_2 \le 2 \|h_{T_{01}}\|_2 + 2 e_0(K)$.
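Let us bound now $\|h_{T_{01}}\|_2$ by using the RIP$_{p,2}$. From the definition of the mapping $J$, we have

$\|\Phi h_{T_{01}}\|_p^2 = \langle J(\Phi h_{T_{01}}), \Phi h_{T_{01}} \rangle = \langle J(\Phi h_{T_{01}}), \Phi h \rangle - \sum_{k \ge 2} \langle J(\Phi h_{T_{01}}), \Phi h_{T_k} \rangle$.

By the Hölder inequality with $r = \frac{p}{p-1}$ and $s = p$,

$\langle J(\Phi h_{T_{01}}), \Phi h \rangle \le \|J(\Phi h_{T_{01}})\|_r\, \|\Phi h\|_s = \|\Phi h_{T_{01}}\|_p\, \|\Phi h\|_p \le 2\epsilon\, \|\Phi h_{T_{01}}\|_p$,

since $\|\Phi h\|_p \le \|\Phi x - y\|_p + \|\Phi x^* - y\|_p \le 2\epsilon$. Using Lemma 2, as $h_{T_{01}}$ is $2K$ sparse and $h_{T_k}$ is $K$ sparse, we know that, for $k \ge 2$,

$|\langle J(\Phi h_{T_{01}}), \Phi h_{T_k} \rangle| \le \mu_{p,2}^2\, C_p\, \|h_{T_{01}}\|_2\, \|h_{T_k}\|_2$,

with $C_p = C_p(\Phi, 2K, K)$, so that, using again the RIP$_{p,2}$ of $\Phi$ and (27),

$(1 - \delta_{2K})\, \mu_{p,2}^2\, \|h_{T_{01}}\|_2^2 \le \|\Phi h_{T_{01}}\|_p^2$

$\le 2\epsilon\, \mu_{p,2} (1 + \delta_{2K})^{1/2} \|h_{T_{01}}\|_2 + \mu_{p,2}^2\, C_p\, \|h_{T_{01}}\|_2 \sum_{k \ge 2} \|h_{T_k}\|_2$

$\le 2\epsilon\, \mu_{p,2} (1 + \delta_{2K})^{1/2} \|h_{T_{01}}\|_2 + \mu_{p,2}^2\, C_p\, \|h_{T_{01}}\|_2 (\|h_{T_{01}}\|_2 + 2 e_0(K))$.

After some simplifications, we get finally

$\|h\|_2 \le \frac{2(1 + C_p - \delta_{2K})}{1 - \delta_{2K} - C_p}\, e_0(K) + \frac{4\sqrt{1 + \delta_{2K}}}{1 - \delta_{2K} - C_p}\, \frac{\epsilon}{\mu_{p,2}}$.

Appendix F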

Proof of Lemma 3: For a random variable $u \sim U([-\frac{1}{2}, \frac{1}{2}])$, we compute easily that $E|u|^p = \frac{1}{2^p (p+1)}$ and $\mathrm{Var}|u|^p = \frac{p^2}{2^{2p} (p+1)^2 (2p+1)}$. Therefore, for a random vector $\xi \in \mathbb{R}^m$ with components independent and identically distributed as $u$, $E\|\xi\|_p^p = \frac{m}{2^p (p+1)}$ and $\mathrm{Var}\|\xi\|_p^p = \frac{m\, p^2}{2^{2p} (p+1)^2 (2p+1)}$.
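The two moment formulas can be verified numerically. The sketch below compares them with a midpoint-rule quadrature over $[-\frac{1}{2}, \frac{1}{2}]$; the function names and the number of quadrature nodes are hypothetical choices for this check.

```python
import numpy as np

def uniform_abs_moment(p):
    """E|u|^p for u uniform on [-1/2, 1/2]."""
    return 1.0 / (2.0 ** p * (p + 1.0))

def uniform_abs_variance(p):
    """Var|u|^p for u uniform on [-1/2, 1/2]."""
    return p ** 2 / (2.0 ** (2 * p) * (p + 1.0) ** 2 * (2.0 * p + 1.0))

def midpoint_mean(func, n=200_000):
    """Midpoint-rule approximation of E[func(u)] for u uniform on [-1/2, 1/2]."""
    t = (np.arange(n) + 0.5) / n - 0.5
    return np.mean(func(t))
```

The variance follows from $E|u|^{2p} - (E|u|^p)^2$, so a single quadrature helper is enough to check both expressions.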

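To prove the probabilistic inequality of the lemma, we define, for $1 \le i \le m$, the positive random variables $Z_i = 2^p |\xi_i|^p$ bounded on the interval $[0, 1]$ with $E Z_i = (p + 1)^{-1}$. Denoting $S = \frac{1}{m} \sum_i Z_i$, the Chernoff-Hoeffding bound lHH tells us that, for $t \ge 0$, $P[S \ge (p + 1)^{-1} + t] \le e^{-2 t^2 m}$. Therefore,

$P[\,\|\xi\|_p^p \ge \zeta_p + 2^{-p}\, m\, t\,] \le e^{-2 t^2 m}$,

with $\zeta_p = E\|\xi\|_p^p$, which gives, for $t = \kappa\, m^{-\frac{1}{2}}$,

$P[\,\|\xi\|_p^p \ge \zeta_p + \frac{\kappa}{2^p}\, m^{\frac{1}{2}}\,] \le e^{-2\kappa^2}$.
The limit value of $(\zeta_p + \frac{\kappa}{2^p}\, m^{\frac{1}{2}})^{1/p}$ when $p \to \infty$ is left to the reader. ■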